An Introduction to the DA-T Gibbs Sampler for
the Two-Parameter Logistic (2PL) Model and Beyond

Gunter Maris & Timo M. Bechger 
Cito (The Netherlands) 

The DA-T Gibbs sampler was proposed by Maris and Maris (2002) as a
Bayesian estimation method for a wide variety of Item Response Theory
(IRT) models. The present paper provides an expository account of the
DA-T Gibbs sampler for the 2PL model. The scope is not limited to the
2PL model, however: it is demonstrated how the DA-T Gibbs sampler for
the 2PL may be used to build, quite easily, Gibbs samplers for other IRT
models. Furthermore, the paper contains a novel, intuitive derivation of
the Gibbs sampler and could be read for a graduate course on sampling.


Introduction 

Let $Y_{pi} = 1$ denote the event that person $p$ gives the correct answer to
item $i$, and $\theta_p$ his ability. Assume that there exists a latent response
variable, $X_{pi}$, such that person $p$ solves item $i$ if $X_{pi}$ is larger than a
threshold $\delta_i$. That is,

$$P(Y_{pi} = 1 \mid \theta_p) = P(X_{pi} > \delta_i \mid \theta_p) .$$

It is seen that the probability of a correct response depends on the threshold of
the item as well as the ability of the respondent. The probability of a correct
response as a function of ability is called the Item Response Function (IRF).


Address correspondence to: Gunter Maris, Cito, P.O. Box 1034, NL-6801 MG, Arnhem,
The Netherlands. E-mail: gunter.maris[at]citogroep.nl; Tel: +31-026-3521 162.

Figure 1. IRFs for two 2PL items with different parameters.


Under the Two-Parameter Logistic (2PL) model (Birnbaum, 1968), $X_{pi}$
is assumed to follow a logistic distribution with mean $\alpha_i\theta_p$ and scale
parameter $\beta = 1$ so that

$$
\begin{aligned}
P(Y_{pi} = 1 \mid \theta_p)
&= \int_{-\infty}^{\infty} (x_{pi} > \delta_i)\, f(x_{pi} \mid \theta_p, \alpha_i)\, dx_{pi} \\
&= \int_{-\infty}^{\infty} (x_{pi} > \delta_i)\,
   \frac{\exp(x_{pi} - \alpha_i\theta_p)}{[1 + \exp(x_{pi} - \alpha_i\theta_p)]^2}\, dx_{pi} \\
&= \frac{\exp(\alpha_i\theta_p - \delta_i)}{1 + \exp(\alpha_i\theta_p - \delta_i)} ,
\end{aligned}
$$

where $(x_{pi} > \delta_i)$ denotes an indicator variable that is one if $x_{pi} > \delta_i$, and zero
otherwise. The Two-parameter Normal Ogive (2NO) model (Birnbaum, 1968)
is obtained when the distribution of the latent response variables is normal.

The discrimination parameter $\alpha_i$ determines how fast the IRF changes
with ability. If $\alpha_i$ is positive (negative), the probability of answering correctly
is an increasing (decreasing) function of ability. The Rasch model (Rasch,
1980) is a special case of the 2PL where all items have a discrimination
parameter equal to one.
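The link between the latent response variable and the closed-form IRF can be checked numerically. The following is a minimal sketch, not part of the original paper; the function names and the parameter values are illustrative assumptions. It compares the closed-form 2PL probability with a Monte Carlo estimate of $P(X_{pi} > \delta_i)$ for a logistic latent response with mean $\alpha_i\theta_p$ and unit scale.

```python
import math
import random


def irf_2pl(theta, alpha, delta):
    """Closed-form 2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(alpha * theta - delta)))


def irf_by_simulation(theta, alpha, delta, n_draws=200_000, seed=1):
    """Monte Carlo estimate of P(X > delta), X ~ logistic(mean=alpha*theta, scale=1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        u = rng.random()
        while u == 0.0:  # guard against log(0)
            u = rng.random()
        # Inverse logistic CDF, shifted by the mean alpha * theta.
        x = alpha * theta + math.log(u / (1.0 - u))
        hits += x > delta
    return hits / n_draws


# Hypothetical parameter values, chosen only for illustration.
p_exact = irf_2pl(theta=0.5, alpha=1.2, delta=0.3)
p_sim = irf_by_simulation(theta=0.5, alpha=1.2, delta=0.3)
```

The two numbers should agree up to Monte Carlo error, which is of the order of 0.001 for this number of draws.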
As it stands, the 2PL is unidentifiable. Specifically,

$$P(Y_{pi} = 1 \mid \theta_p, \alpha_i, \delta_i)
= \frac{\exp(\alpha_i^*\theta_p^* - \delta_i^*)}{1 + \exp(\alpha_i^*\theta_p^* - \delta_i^*)} ,$$

where

$$\alpha_i^* = \alpha_i d, \qquad \delta_i^* = \delta_i - \alpha_i c, \qquad \theta_p^* = \frac{\theta_p - c}{d} ,$$

and $c$ and $d \neq 0$ are arbitrary constants. To deal with this indeterminacy we
arbitrarily set $\alpha_1 = 1$, and $\delta_1 = 0$. This means that the item parameters must be
interpreted relative to the first item.
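The indeterminacy is easy to verify numerically. The sketch below is an illustration only; the helper names and the values of $c$ and $d$ are assumptions made for the example. It applies the transformation $\alpha_i^* = \alpha_i d$, $\delta_i^* = \delta_i - \alpha_i c$, $\theta_p^* = (\theta_p - c)/d$ and confirms that the response probability does not change.

```python
import math


def p_correct(theta, alpha, delta):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(alpha * theta - delta)))


def transform(theta, alpha, delta, c, d):
    """Return (theta*, alpha*, delta*) for arbitrary constants c and d != 0."""
    return (theta - c) / d, alpha * d, delta - alpha * c


# Hypothetical values; any c and any nonzero d give the same probability.
theta, alpha, delta = 0.7, 1.5, -0.2
theta_s, alpha_s, delta_s = transform(theta, alpha, delta, c=2.0, d=-3.0)

p_original = p_correct(theta, alpha, delta)
p_transformed = p_correct(theta_s, alpha_s, delta_s)
```

Since $\alpha_i^*\theta_p^* - \delta_i^* = \alpha_i\theta_p - \delta_i$, the two probabilities coincide, which is why two restrictions are needed to fix the scale and the origin.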

The purpose of this paper is to provide an expository account of
Bayesian estimation of the 2PL focussing on the DA-T Gibbs sampler
developed by Maris and Maris (2002). In addition, we offer an intuitive derivation
of the Gibbs sampler and demonstrate that the DA-T Gibbs sampler for the
2PL can be used to build Gibbs samplers for many other Item Response
Theory (IRT) models. Among others, we consider the Linear Logistic Test Model
(LLTM) (Fischer, 1995), the 3PL (Birnbaum, 1968), and the Nedelsky model
for multiple choice items (Bechger, Maris, Verstralen & Verhelst, 2005).


Gibbs Sampling 

Let $\lambda = (\lambda_1, \ldots, \lambda_m)$, $m \geq 2$, denote a vector of parameters.^1 In
Bayesian statistics, the unknown parameters are considered random variables.
Bayes theorem states that the posterior density (the posterior, for short) of $\lambda$
given the observed data $y$ is given by

$$f(\lambda \mid y) = \frac{f(y \mid \lambda) f(\lambda)}{f(y)} ,$$

where $f(y \mid \lambda)$ denotes the likelihood function, and $f(y)$ the marginal
likelihood function. The prior density $f(\lambda)$ (prior, for short) expresses substantive
knowledge concerning the parameters prior to data collection. In Bayesian
statistics all inferences about the parameters are based upon the posterior.

^1 We use subscripts to distinguish parameter vectors from scalars.
Figure 2. Schematic representation of two iterations of the Gibbs sampler with two
parameters. The plot must be read from upper left to lower right.


The Gibbs sampler is an iterative procedure to generate parameter values
$\lambda^{(0)}, \lambda^{(1)}, \ldots$ from the posterior. The first $n$ generated values are discarded
and the rest is considered to be a dependent and identically distributed (did)
sample from the posterior. This means that

1. The distribution of $\lambda^{(n+j)}$ given the data is the posterior for all $j > 0$.

2. Conditional upon the data, $\lambda^{(n+j)}$ is not independent of the other sampled values.

In this section we discuss how the Gibbs sampler works, why it works,
and when it works. Alternative explanations can be found, for instance, in
Casella and George (1992), Tanner (1996), or Ross (2003). The reader is
referred to Tierney (1994) for a more rigorous treatment.



About the DA-T Gibbs sampler for the 2PL 




How 

The procedure starts by choosing an initial value $\lambda^{(0)}$. Then, in each
successive iteration, the individual parameters are sampled one at a time from
their so-called full conditional distributions. The order in which the
parameters are sampled is arbitrary.

The full conditional density (the full conditional, for short) is the
density given the observed data and the current value of all other parameters. In
the sequel, we will often use the shorthand notation $f(\lambda_k \mid \ldots)$ for the full
conditional of parameter $\lambda_k$. To determine, up to a constant, the full
conditional of a parameter $\lambda_k$ we write down the density $f(\lambda, y)$ and remove all
factors that are unrelated to $\lambda_k$. Specific examples will be given below.

Figure 2 represents two iterations of a fictitious Gibbs sampler, with two
parameters being sampled at each iteration. The closed curve represents the
support of a two-dimensional posterior. The solid lines indicate the support
of the full conditionals and the crosses denote arbitrary values simulated from
the different full conditional distributions. Observe that the Gibbs sampler
"travels" through the support of the posterior along horizontal and vertical
paths. Note that the support of the posterior must be such that every region
can be reached by the Gibbs sampler, irrespective of the point of departure.
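To make the procedure concrete, here is a minimal sketch of a Gibbs sampler with two parameters. The target is an assumption made purely for illustration: a standard bivariate normal "posterior" with correlation `rho`, for which both full conditionals are known to be normal, $\lambda_1 \mid \lambda_2 \sim N(\rho\lambda_2, 1 - \rho^2)$ and vice versa. It is not the sampler for the 2PL.

```python
import random


def gibbs_bivariate_normal(rho, n_iter, start=(5.0, -5.0), seed=0):
    """Two-parameter Gibbs sampler for a standard bivariate normal target."""
    rng = random.Random(seed)
    sd = (1.0 - rho ** 2) ** 0.5  # conditional standard deviation
    lam1, lam2 = start            # arbitrary initial value lambda^(0)
    draws = []
    for _ in range(n_iter):
        # Each parameter is drawn from its full conditional given the
        # current value of the other one (a horizontal, then a vertical move).
        lam1 = rng.gauss(rho * lam2, sd)
        lam2 = rng.gauss(rho * lam1, sd)
        draws.append((lam1, lam2))
    return draws


draws = gibbs_bivariate_normal(rho=0.6, n_iter=20_000)
kept = draws[1_000:]  # discard the first n = 1000 values
mean1 = sum(d[0] for d in kept) / len(kept)
mean2 = sum(d[1] for d in kept) / len(kept)
cross = sum(d[0] * d[1] for d in kept) / len(kept)  # estimates rho here
```

With the first values discarded, the sample means should be close to zero and the mean cross-product close to `rho`, even though the chain was started far from the bulk of the target.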

With a did sample from the posterior we may use the Monte Carlo
method to calculate an unbiased estimate of the posterior expectation of any
function $g(\lambda, y)$:

$$\int g(\lambda, y) f(\lambda \mid y)\, d\lambda \approx \frac{1}{n_g} \sum_j g(\lambda^{(j)}, y) ,$$

where $n_g$ denotes the number of sampled values $\lambda^{(j)}$. That is, we
approximate the expectation by the sample mean. The posterior probability that a
parameter $\lambda_k$ is smaller than or equal to a constant $t$, for example, is estimated by

$$\frac{1}{n_g} \sum_j (\lambda_k^{(j)} \leq t) .$$

The variance of the estimator of the posterior expectation can be estimated by
the variance over independent replications of the Gibbs sampler.
Figure 3. Plot of sampled values against iterations.


Unfortunately, there is no established way to determine an appropriate
value for $n$. One option is to look at plots of $\lambda^{(0)}, \lambda^{(1)}, \ldots$ against iterations
for a number of independent replications. An illustration with four
independent replications is given in Figure 3. If, after $n$ iterations, the values appear to
fluctuate around a common stationary value, this may be taken as
circumstantial evidence that $n$ is large enough. In Figure 3, the plots appear to stabilize
after about 1200 iterations. However, there is no way to be sure since we do
not know what will happen after 5000 iterations. Other ad hoc methods to
assess the required number of iterations are surveyed by Gelman and Rubin
(1992) or Gill (2002, chapter 11).
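One widely used numerical companion to such plots is the potential scale reduction factor associated with Gelman and Rubin (1992), which compares the variance between independent replications with the variance within them. The sketch below is an illustration under stated assumptions: the chains are artificial autoregressive sequences started from dispersed values (not output of the DA-T Gibbs sampler), and the function names are hypothetical. Values near one are, like the plots, only circumstantial evidence.

```python
import random
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Potential scale reduction for equally long chains of one scalar parameter."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)   # average within-chain variance
    between = n * variance(chain_means)          # between-chain variance
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return (pooled / within) ** 0.5


def ar1_chain(start, n_iter, rng, phi=0.9):
    """Artificial dependent sequence standing in for sampler output."""
    x, out = start, []
    for _ in range(n_iter):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(3)
chains = [ar1_chain(s, 5_000, rng) for s in (-50.0, -10.0, 10.0, 50.0)]
rhat_early = potential_scale_reduction([c[:20] for c in chains])     # far above 1
rhat_late = potential_scale_reduction([c[1_000:] for c in chains])   # close to 1
```

The early segments still reflect the dispersed starting values, so the statistic is well above one; after the first 1000 values are discarded it is close to one.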
Why

Let $\{\lambda^{(n)}, n \geq 0\}$ denote a Markov chain. A Markov chain is a
stochastic process such that each value depends only on its immediate predecessor;
that is, for $n > 0$,

$$f(\lambda^{(n)} \mid \lambda^{(n-1)}, \ldots, \lambda^{(0)}) = f(\lambda^{(n)} \mid \lambda^{(n-1)}) .$$

The Gibbs sampler is a procedure to simulate a Markov chain such that
the marginal distribution of $\lambda^{(n)}$ converges to the posterior as $n$ increases.
Convergence to the posterior is guaranteed if the following conditions are
satisfied:

1. The posterior is the invariant distribution.

2. The chain is irreducible.

Invariance means that if $\lambda^{(0)}$ is drawn from the posterior, then all
subsequent values are also draws from the posterior. Suppose, for ease of
presentation, that there are two parameters.^2 Their posterior density is

$$f(\lambda_1, \lambda_2 \mid y) = f(\lambda_1 \mid \lambda_2, y) f(\lambda_2 \mid y) .$$

To sample from this posterior, we draw $\lambda_2^{(1)}$ from the marginal posterior
distribution and then $\lambda_1^{(1)}$ from the distribution conditional upon $\lambda_2^{(1)}$. Note that
the latter is a full conditional as defined in the previous paragraph.

^2 The argument for the general case follows by mathematical induction.

Suppose we set up a Markov chain to draw $\lambda_2^{(1)}$ from the marginal
posterior distribution. Convergence is faster if the dependence between subsequent
values is weaker. Thus we aim for a weak degree of dependence.
Specifically, we ensure that $\lambda_2^{(1)}$ and $\lambda_2^{(0)}$ are independent and identically distributed
conditional upon $\lambda_1^{(0)}$. That is,

$$f(\lambda_2^{(1)}, \lambda_2^{(0)} \mid y) = \int f(\lambda_2^{(1)} \mid \lambda_1^{(0)}, y)\, f(\lambda_2^{(0)} \mid \lambda_1^{(0)}, y)\, f(\lambda_1^{(0)} \mid y)\, d\lambda_1^{(0)} .$$

If we integrate $f(\lambda_2^{(0)}, \lambda_2^{(1)} \mid y)$ with respect to $\lambda_2^{(0)}$ (or $\lambda_2^{(1)}$), we see that $\lambda_2^{(1)}$
and $\lambda_2^{(0)}$ have the same marginal distribution. This distribution is the marginal
posterior. It follows that

$$f(\lambda_2^{(1)} \mid \lambda_2^{(0)}, y) = \int f(\lambda_2^{(1)} \mid \lambda_1^{(0)}, y)\, f(\lambda_1^{(0)} \mid \lambda_2^{(0)}, y)\, d\lambda_1^{(0)} .$$

To produce a value from the posterior distribution we may use the method
of composition (Tanner, 1996, section 3.3.2) as follows:

1. Draw $\lambda_2^{(0)}$ from the marginal posterior.

2. Draw $\lambda_1^{(0)}$ from the full conditional $f(\lambda_1 \mid \lambda_2^{(0)}, y)$.

3. Draw $\lambda_2^{(1)}$ from the full conditional $f(\lambda_2 \mid \lambda_1^{(0)}, y)$.

This procedure is a Gibbs sampler starting with a draw from the posterior.



Figure 4. Schematic picture of the sampling procedure. Within the rectangle is the 
Gibbs sampler. 
With $\lambda_2^{(1)}$ drawn from the marginal posterior, we then draw $\lambda_1^{(1)}$ from the
full conditional $f(\lambda_1 \mid \lambda_2^{(1)}, y)$ and repeat the process with $\lambda_2^{(1)}$ replacing $\lambda_2^{(0)}$,
etc. Schematically, the sampling procedure may be depicted as in Figure 4
where the values generated by the Gibbs sampler are drawn inside a rectangle.
It can be shown that these values are the realization of a Markov chain whose
invariant distribution is, by construction, the posterior. The values outside the
rectangle need not be generated, in which case we obtain the Gibbs sampler as
described in the previous section.

Irreducibility refers to the fact that it must be possible to reach each
region in the support of the posterior (see, e.g., Figure 2). This is true for the
majority of applications.

When 


Gibbs sampling is useful when the full conditionals are tractable. If 
so, it provides an estimation procedure that can be implemented relatively 
quickly. We call a distribution tractable if there is a simple and efficient 
method to generate a sample from it. Methods for stochastic simulation can 
be found, for instance, in Devroye (1986), Ripley (1997), or Ross (2001). 

There are many situations where the full conditionals are not tractable. 
This is, in fact, the case of the 2PL (Maris & Maris, 2002). In the next section, 
we will demonstrate that the DA-T Gibbs sampler is a variant of the Gibbs 
sampler designed to give tractable full conditionals. 

The DA-T Gibbs Sampler for the 2PL 

The Prior 

We conveniently assume that the parameters are a priori independent.
That is,

$$f(\theta, \delta, \alpha) = \prod_p f(\theta_p) \prod_i f(\delta_i) f(\alpha_i) . \tag{1}$$

We also assume that all prior distributions are tractable. Note that the priors 
must be chosen relative to the item whose parameters are arbitrarily fixed to 
identify the model. 
The Full Conditionals

DA stands for Data Augmentation which entails adding latent data as
(auxiliary) parameters (Tanner & Wong, 1987). The principle of DA may be
stated as follows:

Augment the observed data with latent data so that the augmented
posterior distribution is "simple" (e.g., Tanner, 1996, p. 38).

Here, the continuous latent responses are added as parameters. Our hope is
that DA will result in tractable full conditionals. Let us see.

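The DA posterior of the 2PL is proportional to

$$
\begin{aligned}
f(\theta, \delta, \alpha, x, y) &= f(y \mid x, \theta, \delta, \alpha)\, f(x \mid \theta, \delta, \alpha)\, f(\theta, \delta, \alpha) \\
&= f(y \mid x, \delta)\, f(x \mid \theta, \alpha)\, f(\theta, \delta, \alpha) .
\end{aligned}
$$

Persons are assumed to be independent of one another, so that

$$
f(y \mid x, \delta) = \prod_p \prod_i f(y_{pi} \mid x_{pi}, \delta_i), \qquad
f(y_{pi} \mid x_{pi}, \delta_i) =
\begin{cases}
1 & \text{if } x_{pi} > \delta_i \text{ and } y_{pi} = 1 \\
1 & \text{if } x_{pi} \leq \delta_i \text{ and } y_{pi} = 0 \\
0 & \text{otherwise,}
\end{cases}
$$

that is,

$$f(y \mid x, \delta) = \prod_p \prod_i (x_{pi} > \delta_i)^{y_{pi}} (x_{pi} \leq \delta_i)^{1 - y_{pi}} ,$$

and

$$f(x \mid \theta, \alpha) = \prod_p \prod_i f(x_{pi} \mid \theta_p, \alpha_i)
= \prod_p \prod_i \frac{\exp(x_{pi} - \alpha_i\theta_p)}{[1 + \exp(x_{pi} - \alpha_i\theta_p)]^2} .$$

Thus $f(\theta, \delta, \alpha, x, y)$ equals

$$\prod_p \prod_i (x_{pi} > \delta_i)^{y_{pi}} (x_{pi} \leq \delta_i)^{1 - y_{pi}}
\frac{\exp(x_{pi} - \alpha_i\theta_p)}{[1 + \exp(x_{pi} - \alpha_i\theta_p)]^2}\; f(\theta, \delta, \alpha) . \tag{2}$$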


The next step is to derive the full conditionals, including those for the
latent responses. Let us first consider the full conditional distribution of $\delta_i$,
where $i > 1$ since the first item location parameter was fixed. If we remove
from (2) all terms that are unrelated to $\delta_i$, we find that

$$f(\delta_i \mid \ldots) \propto f(\delta_i) \left[ \prod_p (x_{pi} > \delta_i)^{y_{pi}} (x_{pi} \leq \delta_i)^{1 - y_{pi}} \right] . \tag{3}$$

In Equation 3, the term within brackets represents a bounded interval. It
is illustrated in Figure 5 that

$$
\begin{aligned}
\prod_p (x_{pi} > \delta_i)^{y_{pi}} (x_{pi} \leq \delta_i)^{1 - y_{pi}}
&= \prod_{p: y_{pi} = 1} (x_{pi} > \delta_i) \prod_{p: y_{pi} = 0} (x_{pi} \leq \delta_i) \\
&= \Bigl( \delta_i < \min_{p: y_{pi} = 1} \{x_{pi}\} \Bigr) \Bigl( \max_{p: y_{pi} = 0} \{x_{pi}\} \leq \delta_i \Bigr) \\
&= \Bigl( \max_{p: y_{pi} = 0} \{x_{pi}\} \leq \delta_i < \min_{p: y_{pi} = 1} \{x_{pi}\} \Bigr) .
\end{aligned}
\tag{4}
$$

Thus, $f(\delta_i \mid \ldots)$ is the truncated prior of $\delta_i$, which is tractable.

Figure 5. Illustration of the steps taken to arrive at Equation 4. It is illustrated that
each factor in (3) represents a half open interval, extending either to $+\infty$ or $-\infty$, and
their intersection is a bounded interval.
In a similar way, we find that the full conditionals of $\alpha_i$ and $\theta_p$ are
proportional to the product of logistic with prior densities. For instance, the
full conditional of any $\theta_p$ is

$$f(\theta_p \mid \ldots) \propto f(\theta_p) \prod_i \frac{\exp(x_{pi} - \alpha_i\theta_p)}{[1 + \exp(x_{pi} - \alpha_i\theta_p)]^2} . \tag{5}$$

Unfortunately, the product of logistic densities is not tractable. We conclude
that DA has not simplified the task of sampling from the full conditional
distributions.^3

^3 In contrast, a product of normal densities is again a normal density. Thus, DA is effective
for the 2NO model with normal priors (Albert, 1992; Albert & Chib, 1993). Here, we will
not consider a method that works only for a particular prior distribution.

As seen in Equation 5, the problem is due to the fact that the distribution
of the latent responses depends on the item and person parameters. The DA-T
Gibbs sampler is obtained if we transform the continuous latent responses to
remove all parameters from their distribution. Hence, the T stands for
Transformation. For the 2PL we apply the transformation $Z_{pi} = X_{pi} - \alpha_i\theta_p$. The
resulting "DA-T posterior", $f(\theta, \delta, \alpha, z \mid y)$, is proportional to $f(\theta, \delta, \alpha, z, y)$.
From (2) it is easily found that $f(\theta, \delta, \alpha, z, y)$ equals

$$\prod_p \prod_i (z_{pi} + \alpha_i\theta_p > \delta_i)^{y_{pi}} (z_{pi} + \alpha_i\theta_p \leq \delta_i)^{1 - y_{pi}}
\frac{\exp(z_{pi})}{[1 + \exp(z_{pi})]^2}\; f(\theta, \delta, \alpha) . \tag{6}$$

Removing unrelated factors from Equation 6 shows that each of the full
conditionals is now a tractable truncated distribution:

1. The full conditional of $z_{pi}$ is a logistic distribution with support

$$(z_{pi} > \delta_i - \alpha_i\theta_p)^{y_{pi}} (z_{pi} \leq \delta_i - \alpha_i\theta_p)^{1 - y_{pi}} .$$

2. The full conditional of $\delta_i$ ($i > 1$):

$$f(\delta_i \mid \ldots) \propto f(\delta_i) \left[ \prod_p (\delta_i < z_{pi} + \alpha_i\theta_p)^{y_{pi}} (\delta_i \geq z_{pi} + \alpha_i\theta_p)^{1 - y_{pi}} \right] .$$

3. The full conditional of $\alpha_i$ ($i > 1$):

$$f(\alpha_i \mid \ldots) \propto f(\alpha_i) \left[ \prod_p (\alpha_i\theta_p > \delta_i - z_{pi})^{y_{pi}} (\alpha_i\theta_p \leq \delta_i - z_{pi})^{1 - y_{pi}} \right] .$$
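4. The full conditional of $\theta_p$:

$$f(\theta_p \mid \ldots) \propto f(\theta_p) \left[ \prod_i (\theta_p\alpha_i > \delta_i - z_{pi})^{y_{pi}} (\theta_p\alpha_i \leq \delta_i - z_{pi})^{1 - y_{pi}} \right] .$$

In the following paragraph we demonstrate how the support of each full
conditional is determined.

Calculating the Truncation Constants

The support of each of the full conditionals is seen to be a product of
indicator functions of the following form:

$$\prod_j (l_j < \lambda_k < h_j) = \Bigl( \max_j \{l_j\} < \lambda_k < \min_j \{h_j\} \Bigr) , \tag{7}$$

where either $l_j = -\infty$ or $h_j = \infty$. Hence, each term $(l_j < \lambda_k < h_j)$ restricts
the range of $\lambda_k$ to a half open interval extending to either plus or minus
infinity. As illustrated in the previous section, their product is the intersection
of these intervals, ranging from $\max_j \{l_j\}$ to $\min_j \{h_j\}$ (see Figure 5). Thus,
$\max_j \{l_j\}$ and $\min_j \{h_j\}$ are the truncation constants for the full conditional.
Because the distributions involved are continuous, it is immaterial whether the
end points of the interval are included.

The support for $\delta_i$. The support for $\delta_i$ is a product of indicator functions
over persons. We see that

$$
l_p = \begin{cases} -\infty & \text{if } y_{pi} = 1 \\ z_{pi} + \alpha_i\theta_p & \text{if } y_{pi} = 0 \end{cases}
\qquad \text{and} \qquad
h_p = \begin{cases} z_{pi} + \alpha_i\theta_p & \text{if } y_{pi} = 1 \\ \infty & \text{if } y_{pi} = 0 . \end{cases}
$$

The support of $\alpha_i$. Note that

$$
(\alpha_i\theta_p > \delta_i - z_{pi})^{y_{pi}} (\alpha_i\theta_p \leq \delta_i - z_{pi})^{1 - y_{pi}}
= \begin{cases}
(t_{pi} < \alpha_i < \infty)^{y_{pi}} (-\infty < \alpha_i \leq t_{pi})^{1 - y_{pi}} & \text{if } \theta_p > 0 \\
(-\infty < \alpha_i < t_{pi})^{y_{pi}} (t_{pi} \leq \alpha_i < \infty)^{1 - y_{pi}} & \text{if } \theta_p < 0 ,
\end{cases}
\tag{8}
$$

where

$$t_{pi} = \frac{\delta_i - z_{pi}}{\theta_p} . \tag{9}$$

The indicator functions depend on the sign of $\theta_p$ because we divide by $\theta_p$ on
both sides of the inequality sign in (8). The support for $\alpha_i$ is a product over
persons. If $\theta_p > 0$,

$$
l_p = \begin{cases} t_{pi} & \text{if } y_{pi} = 1 \\ -\infty & \text{if } y_{pi} = 0 \end{cases}
\qquad \text{and} \qquad
h_p = \begin{cases} \infty & \text{if } y_{pi} = 1 \\ t_{pi} & \text{if } y_{pi} = 0 . \end{cases}
$$

If $\theta_p < 0$, then

$$
l_p = \begin{cases} -\infty & \text{if } y_{pi} = 1 \\ t_{pi} & \text{if } y_{pi} = 0 \end{cases}
\qquad \text{and} \qquad
h_p = \begin{cases} t_{pi} & \text{if } y_{pi} = 1 \\ \infty & \text{if } y_{pi} = 0 . \end{cases}
$$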

The support of $\theta_p$. Calculating the support for $\theta_p$ is very similar to
calculating the support of $\alpha_i$. The difference is that here we have a product over
items. Let

$$t_{pi} = \frac{\delta_i - z_{pi}}{\alpha_i} . \tag{10}$$

If $\alpha_i > 0$, then

$$
l_i = \begin{cases} t_{pi} & \text{if } y_{pi} = 1 \\ -\infty & \text{if } y_{pi} = 0 \end{cases}
\qquad \text{and} \qquad
h_i = \begin{cases} \infty & \text{if } y_{pi} = 1 \\ t_{pi} & \text{if } y_{pi} = 0 . \end{cases}
$$

If $\alpha_i < 0$,

$$
l_i = \begin{cases} -\infty & \text{if } y_{pi} = 1 \\ t_{pi} & \text{if } y_{pi} = 0 \end{cases}
\qquad \text{and} \qquad
h_i = \begin{cases} t_{pi} & \text{if } y_{pi} = 1 \\ \infty & \text{if } y_{pi} = 0 . \end{cases}
$$

In practice, we consider each interval in (7) separately and increase
(decrease) the lower bound (upper bound) of the intersection each time we
encounter an interval with a higher lower bound (lower upper bound). This
is illustrated with the following pseudo-code description of an algorithm to
determine the truncation constants for the full conditional of $\theta_p$:
l = -infinity
h = +infinity

FOR i = 1 to the number of items

    t_pi = (delta_i - z_pi) / alpha_i

    IF y_pi = 0

        IF t_pi < h and alpha_i > 0 THEN h = t_pi
        IF t_pi > l and alpha_i < 0 THEN l = t_pi

    IF y_pi = 1

        IF t_pi > l and alpha_i > 0 THEN l = t_pi
        IF t_pi < h and alpha_i < 0 THEN h = t_pi

END
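The pseudo-code translates almost line by line into a runnable function. The following Python version is a sketch; the function name, the argument layout (one person's responses and latent values as lists over items) and the example numbers are assumptions made for illustration. It presumes that no discrimination parameter is exactly zero.

```python
import math


def theta_truncation_bounds(y_p, z_p, alpha, delta):
    """Return (l, h), the truncation constants for the full conditional of theta_p.

    y_p, z_p: responses (0/1) and transformed latent responses of person p.
    alpha, delta: item discrimination and location parameters.
    """
    lower, upper = -math.inf, math.inf
    for y, z, a, d in zip(y_p, z_p, alpha, delta):
        t = (d - z) / a  # t_pi, as in Equation 10
        if (y == 1) == (a > 0):
            # y = 1 with alpha > 0, or y = 0 with alpha < 0: theta_p lies above t
            lower = max(lower, t)
        else:
            # y = 0 with alpha > 0, or y = 1 with alpha < 0: theta_p lies below t
            upper = min(upper, t)
    return lower, upper


# Hypothetical values for one person and three items.
lower, upper = theta_truncation_bounds(
    y_p=[1, 0, 1],
    z_p=[0.4, -0.2, 1.5],
    alpha=[1.0, 2.0, -1.0],
    delta=[0.0, 1.0, 0.5],
)
```

In this example the first item gives the lower bound $-0.4$, the second item the upper bound $0.6$, and the third item (with a negative discrimination) an upper bound of $1.0$ that is not binding.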

It is clear that the DA-T Gibbs sampler stops if any of the intersections 
is empty. In the next paragraph it is shown that this will never happen. 

Could any of the Intersections be Empty? 

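For any parameter values at the $j$th iteration, we generate latent data
such that

$$\prod_i \prod_p \bigl( z_{pi}^{(j)} + \alpha_i^{(j)}\theta_p^{(j)} > \delta_i^{(j)} \bigr)^{y_{pi}} \bigl( z_{pi}^{(j)} + \alpha_i^{(j)}\theta_p^{(j)} \leq \delta_i^{(j)} \bigr)^{1 - y_{pi}} = 1 .$$

This means that, at this point, we are inside the support of the posterior. Then,
we draw, say, $\delta_i$ from

$$f(\delta_i \mid \ldots) \propto \left[ \prod_p \bigl( z_{pi}^{(j)} + \alpha_i^{(j)}\theta_p^{(j)} > \delta_i \bigr)^{y_{pi}} \bigl( z_{pi}^{(j)} + \alpha_i^{(j)}\theta_p^{(j)} \leq \delta_i \bigr)^{1 - y_{pi}} \right] f(\delta_i) .$$

Since the term within square brackets is one for $\delta_i = \delta_i^{(j)}$, it follows that the
support of the full conditional is not empty. The same is true for the other
parameters. It follows that none of the intersections can be empty.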
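The argument can be illustrated with a small sketch of one complete sweep of a DA-T Gibbs sampler for the 2PL. This is an illustration under stated assumptions, not the authors' implementation: all priors are taken to be zero-mean logistic with a hypothetical scale of 3, the first item (index 0 in the code) is fixed at $\alpha = 1$, $\delta = 0$, the data are simulated, and all function names are invented for the example. After every sweep the code checks that the latent data and the updated parameters still satisfy all indicator constraints.

```python
import math
import random


def logistic_cdf(x, loc=0.0, scale=1.0):
    v = (x - loc) / scale
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def draw_truncated_logistic(rng, low, high, loc=0.0, scale=1.0):
    """Inverse-CDF draw from a logistic(loc, scale) restricted to (low, high)."""
    f_low, f_high = logistic_cdf(low, loc, scale), logistic_cdf(high, loc, scale)
    u = f_low + rng.random() * (f_high - f_low)
    u = min(max(u, 1e-15), 1.0 - 1e-15)
    x = loc + scale * math.log(u / (1.0 - u))
    return min(max(x, low), high)  # guard against rounding at the end points


def bounds(constraints):
    """Intersect half-lines given as (is_lower_bound, value) pairs."""
    low, high = -math.inf, math.inf
    for is_lower, t in constraints:
        if is_lower:
            low = max(low, t)
        else:
            high = min(high, t)
    return low, high


def sweep(rng, y, theta, alpha, delta, prior_scale=3.0):
    """One sweep; theta, alpha and delta are updated in place, z is returned."""
    n_p, n_i = len(y), len(y[0])
    # 1. Latent responses z_pi: standard logistic, truncated according to y_pi.
    z = [[0.0] * n_i for _ in range(n_p)]
    for p in range(n_p):
        for i in range(n_i):
            cut = delta[i] - alpha[i] * theta[p]
            lo, hi = (cut, math.inf) if y[p][i] == 1 else (-math.inf, cut)
            z[p][i] = draw_truncated_logistic(rng, lo, hi)
    for i in range(1, n_i):  # the first item stays fixed
        # 2. delta_i: y = 0 gives a lower bound, y = 1 an upper bound.
        lo, hi = bounds((y[p][i] == 0, z[p][i] + alpha[i] * theta[p]) for p in range(n_p))
        delta[i] = draw_truncated_logistic(rng, lo, hi, scale=prior_scale)
        # 3. alpha_i: the direction of each bound depends on the sign of theta_p.
        lo, hi = bounds(((y[p][i] == 1) == (theta[p] > 0), (delta[i] - z[p][i]) / theta[p])
                        for p in range(n_p))
        alpha[i] = draw_truncated_logistic(rng, lo, hi, scale=prior_scale)
    for p in range(n_p):
        # 4. theta_p: the direction of each bound depends on the sign of alpha_i.
        lo, hi = bounds(((y[p][i] == 1) == (alpha[i] > 0), (delta[i] - z[p][i]) / alpha[i])
                        for i in range(n_i))
        theta[p] = draw_truncated_logistic(rng, lo, hi, scale=prior_scale)
    return z


def constraints_hold(y, z, theta, alpha, delta):
    return all((z[p][i] + alpha[i] * theta[p] > delta[i]) == (y[p][i] == 1)
               for p in range(len(y)) for i in range(len(y[0])))


rng = random.Random(7)
n_p, n_i = 30, 4
true_theta = [rng.gauss(0.0, 1.0) for _ in range(n_p)]
y = [[int(rng.random() < logistic_cdf(true_theta[p] - 0.3 * i)) for i in range(n_i)]
     for p in range(n_p)]
theta = [rng.gauss(0.0, 1.0) for _ in range(n_p)]  # arbitrary nonzero start values
alpha, delta = [1.0] * n_i, [0.0] * n_i
ok = True
for _ in range(200):
    z = sweep(rng, y, theta, alpha, delta)
    ok = ok and constraints_hold(y, z, theta, alpha, delta)
```

The flag `ok` stays true because every parameter is drawn from an interval that contains its current value, so the sampler never leaves the support of the posterior. The plain inverse-CDF step loses precision far in the tails; a careful implementation would treat the upper tail separately.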

Sampling from a Truncated Distribution 

Let $X$ denote a random variable with distribution function $F$. We wish
to generate a realization of $X$ under the condition that $X$ takes values in the
interval $(l < X < h)$. Figure 6 illustrates how this may be done. First, we
draw $u$ from the uniform distribution on $(0, 1)$. We then transform $u$ to

$$u^* = u\,[F(h) - F(l)] + F(l) ,$$

which lies in the interval from $F(l)$ to $F(h)$. The value $F^{-1}(u^*)$ is a
realization of the truncated variable.

Figure 6. Simulating from a truncated distribution.
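The transformation takes only a few lines of code for any distribution whose distribution function and its inverse are available. The sketch below uses the `NormalDist` class from the Python standard library as a stand-in for $F$; the normal prior, the bounds and the function name are assumptions for the example, not choices made in the paper.

```python
import random
from statistics import NormalDist


def draw_truncated(dist, low, high, rng):
    """Draw from `dist` restricted to (low, high) by inverting its CDF."""
    f_low, f_high = dist.cdf(low), dist.cdf(high)
    u = rng.random()                        # uniform on [0, 1)
    u_star = u * (f_high - f_low) + f_low   # lies between F(l) and F(h)
    return dist.inv_cdf(u_star)


rng = random.Random(11)
prior = NormalDist(mu=0.0, sigma=2.0)       # hypothetical prior
draws = [draw_truncated(prior, -0.5, 1.5, rng) for _ in range(5_000)]
```

Every draw falls inside the interval, and no draws are rejected, which matters when the interval is narrow and has little prior mass.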

Estimating Under Restrictions 

Researchers often hold prior ideas about the parameters that take the
form of order restrictions on the parameters. They may, for instance, believe
item 1 to be easier than item 2. Thus, the prior density becomes

$$f(\theta, \delta, \alpha)\,(\delta_1 < \delta_2) .$$

Each such restriction is added to the range restrictions of the full conditionals.
Handling Incomplete Designs

In applications, the design of the study is often incomplete. This means
that only a subset of the available items is administered to each person, and
no responses are observed for items that were not administered. To adapt the
Gibbs sampler to handle data collected in an incomplete design we need only
ignore, for each person, the items that were not administered.

Linear Logistic Test Models 

In this section we demonstrate how the DA-T Gibbs sampler for the
2PL is adapted to estimate the parameters of the Linear Logistic Test Model
(LLTM) (Fischer, 1995 and references therein).

Assume that the Rasch model is valid. That is, all discrimination
parameters are unit constants. The LLTM specifies each item difficulty parameter
as a linear combination of so-called basic parameters:

$$\delta_i(\eta) = q_{i1}\eta_1 + q_{i2}\eta_2 + \cdots + q_{ik}\eta_k . \tag{11}$$

For ease of presentation, we assume that $\eta_j$ refers to the difficulty of a mental
operation, and $q_{ij}$ to the number of times this operation is required for the $i$-th
item. Thus, the weights $q_{ij}$ are non-negative integers.

The DA-T Gibbs sampler for the LLTM differs very little from that of
the Rasch model. Instead of sampling the item difficulties we now sample the
basic parameters. If we replace, in the DA-T posterior of the Rasch model (6),
$\delta_i$ by $\delta_i(\eta)$, it is easy to derive that

$$f(\eta_j \mid \ldots) \propto f(\eta_j) \prod_p \prod_{i: q_{ij} > 0} (\eta_j < t_{pi})^{y_{pi}} (\eta_j \geq t_{pi})^{1 - y_{pi}} ,$$

where $\prod_{i: q_{ij} > 0}$ denotes the product over all items that require the $j$-th mental
operation, and

$$t_{pi} = \frac{z_{pi} + \theta_p - \sum_{h \neq j} q_{ih}\eta_h}{q_{ij}} .$$

The further specification of the DA-T Gibbs sampler is only marginally
different from the DA-T Gibbs sampler for the 2PL.
For illustration, we analyse a small data set that was published by Rost
(1996, pp. 99-100).^4 The data set consists of the responses of 300 persons to
five geometrical analogy items. Rost (1996, section 3.4) considers the
following weights appropriate for these five items:

$$
Q = \begin{pmatrix}
1 & 0 \\
2 & 0 \\
1 & 1 \\
2 & 1 \\
2 & 2
\end{pmatrix} .
$$

Thus, $q_{21} = 2$, $q_{32} = 1$, etc. In this case, the weights are such that the basic
parameters are uniquely determined (see Fischer, 1995).
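The mapping from basic parameters to item difficulties in Equation 11 is a matrix-vector product. The following sketch uses the weight matrix given above; the values in `eta_example` are hypothetical and serve only to show the arithmetic, they are not estimates from the data.

```python
# Weight matrix for the five geometrical analogy items (rows: items,
# columns: mental operations), as given in the text.
Q = [
    [1, 0],
    [2, 0],
    [1, 1],
    [2, 1],
    [2, 2],
]


def item_difficulties(q_matrix, eta):
    """delta_i(eta) = sum_j q_ij * eta_j for every item i."""
    return [sum(q * e for q, e in zip(row, eta)) for row in q_matrix]


eta_example = [0.5, 1.0]  # hypothetical basic parameters
delta = item_difficulties(Q, eta_example)  # [0.5, 1.0, 1.5, 2.0, 3.0]
```

Five item difficulties are thus described by two basic parameters, which is what makes the LLTM a restricted version of the Rasch model.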

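We assume that the persons are a simple random sample from a
normal population with mean $\mu$ and variance $\sigma^2$. The population parameters are
estimated with the other parameters. The full conditional of $\mu$ is

$$f(\mu \mid \ldots) \propto f(\mu) \prod_p \prod_i (\mu > t_{pi})^{y_{pi}} (\mu \leq t_{pi})^{1 - y_{pi}} ,$$

where $t_{pi} = \sum_j q_{ij}\eta_j - z_{pi} - \eta_p\sigma$, and $\eta_p = (\theta_p - \mu)/\sigma$ denotes the
standardized ability of person $p$ (not to be confused with the basic parameters $\eta_j$).
The full conditional of $\sigma$ is

$$f(\sigma \mid \ldots) \propto f(\sigma)\,(\sigma > 0) \prod_p \prod_i (\eta_p\sigma > \delta_i - z_{pi} - \mu)^{y_{pi}} (\eta_p\sigma \leq \delta_i - z_{pi} - \mu)^{1 - y_{pi}} ,$$

with $\delta_i = \delta_i(\eta)$ as in (11). The other full conditionals are unchanged, but $\theta_p$ is
replaced by $\eta_p\sigma + \mu$. The details are in Maris and Maris (2002, section 2.3.4).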

We do not presume to know very much about the parameters and use
zero-mean logistic priors with a large variance. After a burn-in period of
200,000 iterations we had the program run for a few days to do several million
iterations. The posterior means and standard deviations are in the following
table:

                          $\eta_1$   $\eta_2$   $\mu$    $\sigma$
posterior mean            0.463      0.969      1.361    1.993
posterior stand. dev.     0.149      0.103      0.270    0.157

The first mental operation was more difficult than the second in over 99% of
the sampled values.

^4 Previous (non-Bayesian) analyses on the same data are reported by Rost (1996), and
Bechger, Verstralen, and Verhelst (2002, section 6).

Figure 7. Summary of sampled values of the first basic parameter following a
burn-in period. The first plot shows the running mean over iterations. The second plot
shows sampled values. The line running through the sampled values is again the
running mean.

Figure 7 shows two plots of the running mean of the sampled values
of $\eta_1$. The upper drawing suggests that the chain has not converged after the
burn-in period. After an initial phase of erratic behaviour, the running mean
is seen to move downwards, stabilizing after about 3,500,000 iterations. In
the lower plot, however, it is seen that the variation in the running mean is
negligible on the scale of the sampled values.
2PL Mixture IRT Models

A 2PL Mixture Model (2PLMM) is an IRT model that can be written
as:

$$P(Y_{pi} = j \mid \theta, \lambda) = \sum_s P(Y_{pi} = j \mid S = s, \lambda_{y|s})\, P(S = s \mid \theta, \lambda_s) ,$$

where:

1. $S = (S_1, \ldots, S_k)$ denotes a vector of discrete latent item responses.

2. $Y_{pi} \mid S = s$ follows a multinomial distribution.

3. $P(S = s \mid \theta, \lambda_s)$ is the likelihood of $k$ locally independent 2PL items.

4. $\theta$ may be multi-dimensional.

2PLMMs are defined by restrictions on the distribution of $Y_{pi}$ given
$S = s$. Consider, for example, the 3PL. In the 3PL, $k = 1$, and

$$
S = \begin{cases}
1 & \text{if a person knows the correct answer} \\
0 & \text{if he doesn't know the correct answer.}
\end{cases}
$$

Consequently,

$$P(Y_{pi} = 1 \mid S = 1, \lambda_{y|s}) = 1 \quad \text{and} \quad P(Y_{pi} = 1 \mid S = 0, \lambda_{y|s}) = \lambda_{y|s} .$$

In latent response models (Maris, 1995), $\theta$ is multi-dimensional but the
probabilities $P(Y_{pi} = j \mid S = s)$ are known and equal to zero or one. An example
is the conjunctive Rasch model (see Maris & Maris, 2002, section 2.3.2).

The DA-T Gibbs sampler for the 2PL can be used to build a Gibbs
sampler for any 2PLMM. Specifically, at each iteration we draw a sample
from the posterior

$$f(\theta, \lambda, s \mid y) \propto f(\theta, \lambda, s, y)$$

in three steps:

1. Generate latent discrete item responses from $f(s \mid \theta, \lambda, y)$.

2. Generate $\theta$ and $\lambda_s$ from $f(\theta, \lambda_s \mid s)$.

3. Generate $\lambda_{y|s}$ from $f(\lambda_{y|s} \mid s, y)$.

Due to local independence (LI), step 1 entails generating independent responses to each of $k$
2PL items for each of the persons. Step 2 can be done using the DA-T Gibbs
sampler for the 2PL. Step 3 is the most complicated step. It is relatively simple
if the prior of $\lambda_{y|s}$ is taken to be a truncated Dirichlet distribution because this
implies that the corresponding full conditional is also a truncated Dirichlet
distribution. In the 3PL, for instance, the full conditional of the guessing
parameter would then be a truncated $\beta$-distribution. In latent response models,
step 3 is unnecessary because $\lambda_{y|s}$ is known. As an illustration, we construct
a Gibbs sampler for the Nedelsky model.

The Nedelsky Model for Multiple-Choice Items 


Consider a multiple-choice (MC) item $i$ with $J_i + 1$ options arbitrarily
indexed $0, 1, \ldots, J_i$. For convenience, 0 indexes the only correct alternative.
The Nedelsky Model (NM) is based upon the idea that a person responds to
a MC question by first eliminating the incorrect answers (or distractors) he
recognizes as wrong and then guesses at random from the remaining answers.

The probability that wrong answer $j$ is recognized as wrong by a
respondent with ability $\theta_p$ is modelled as a 2PL. That is, for $j = 1, \ldots, J_i$,

$$P(S_{ij} = 1 \mid \theta_p) = \frac{\exp(\alpha_i\theta_p - \delta_{ij})}{1 + \exp(\alpha_i\theta_p - \delta_{ij})} ,$$

where $S_{ij}$ denotes a random variable that indicates whether alternative $j$ is
recognized to be wrong. Thus we may think of each distractor as a 2PL item.
A "correct" answer is produced if the distractor is seen to be wrong. We will
now assume that the discrimination parameter is positive.

Define a latent subset $S_i$ by the vector $(0, S_{i1}, \ldots, S_{iJ_i})$. Assuming
independence among the options given $\theta$, the probability that a subject with
ability $\theta_p$ chooses any latent subset $s_i$ is given by

$$
P(S_i = s_i \mid \theta_p)
= \prod_{j=1}^{J_i} \frac{\exp(\alpha_i\theta_p - \delta_{ij})^{s_{ij}}}{1 + \exp(\alpha_i\theta_p - \delta_{ij})}
= \frac{\exp\bigl(\alpha_i\theta_p s_i^+ - \sum_{j=1}^{J_i} s_{ij}\delta_{ij}\bigr)}{\prod_{j=1}^{J_i} [1 + \exp(\alpha_i\theta_p - \delta_{ij})]} ,
$$

where $s_i^+ = \sum_{j=1}^{J_i} s_{ij}$ denotes the number of distractors that are recognized as
wrong.

Once a latent subset is chosen, a respondent guesses at random from
the remaining answers. Thus, the conditional probability of responding with
option $j$ to item $i$, given latent subset $s_i$, is given by:

$$P(Y_i = j \mid S_i = s_i) = \frac{1 - s_{ij}}{\sum_{h=0}^{J_i} (1 - s_{ih})} ,$$

where $\sum_{h=0}^{J_i} (1 - s_{ih})$ denotes the number of alternatives to choose from.

Figure 8. The item response function $P(Y_i = j \mid \theta)$ (with $\alpha_i > 0$) against $\theta$ for a
Nedelsky item with five categories.

As $\theta \to -\infty$ (with $\alpha_i > 0$), no distractor is recognized as wrong and the
response probabilities tend to

$$\frac{1}{J_i + 1} \quad (\text{for } j = 0, \ldots, J_i) .$$

In fact, if an item has only two answer categories (wrong and correct), the
NM equals the 3PL with the guessing parameter in the latter model fixed at $1/2$.
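The response probabilities of a Nedelsky item follow from summing over all latent subsets. The sketch below enumerates the $2^{J_i}$ subsets for a single item; the function names and the parameter values are illustrative assumptions. It can be used to check numerically that the probabilities sum to one, and that an item with a single distractor behaves like a 3PL item with guessing parameter $1/2$.

```python
import math
from itertools import product


def recognition_probs(theta, alpha, deltas):
    """P(S_ij = 1 | theta) for each distractor j = 1..J (a 2PL per distractor)."""
    return [1.0 / (1.0 + math.exp(-(alpha * theta - d))) for d in deltas]


def nedelsky_response_probs(theta, alpha, deltas):
    """P(Y = j | theta) for options j = 0..J, where option 0 is the correct one."""
    n_distractors = len(deltas)
    rec = recognition_probs(theta, alpha, deltas)
    probs = [0.0] * (n_distractors + 1)
    for s in product((0, 1), repeat=n_distractors):
        # Probability of this latent subset under independence of the options.
        p_subset = math.prod(r if s_j else 1.0 - r for r, s_j in zip(rec, s))
        flags = (0,) + s                       # the correct option is never eliminated
        n_left = sum(1 - f for f in flags)     # alternatives left to guess from
        for j, f in enumerate(flags):
            probs[j] += p_subset * (1 - f) / n_left
    return probs


# Hypothetical item with three distractors, and one with a single distractor.
probs_four_options = nedelsky_response_probs(0.4, 1.3, [-0.5, 0.2, 1.0])
probs_two_options = nedelsky_response_probs(0.4, 1.3, [0.2])
probs_low_ability = nedelsky_response_probs(-50.0, 1.0, [0.0, 0.0, 0.0, 0.0])
```

For very low ability the five response probabilities in `probs_low_ability` are all close to $1/5$, in line with the limit stated above.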

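We will now derive a DA-T Gibbs sampler for the NM. Let $s$ denote the
latent subsets, and $s_{pi}$ the latent subset of respondent $p$ on answering the $i$-th
item. The vector $\theta$ contains the abilities, and the vector $\delta$ the parameters of
the items. The parameters of item $i$ are denoted by $\delta_i = (\alpha_i, \delta_{i1}, \ldots, \delta_{iJ_i})$.

We proceed by drawing a sample from $f(\theta, \delta, s \mid y)$ and then ignore the
latent subsets. To this aim, we consider two full conditionals: $f(\theta, \delta \mid y, s) =
f(\theta, \delta \mid s)$ and $f(s \mid \theta, \delta, y)$, and repeat the following steps:

1. Draw latent subsets from $f(s \mid \theta, \delta, y)$.

2. Draw $\theta$ and $\delta$ from $f(\theta, \delta \mid s)$ using the Gibbs sampler for the 2PL.

Using LI and Bayes theorem it is seen that

$$
f(s \mid \theta, \delta, y)
= \prod_p \prod_i \frac{P(y_{pi} \mid s_{pi})\, P(s_{pi} \mid \theta_p, \delta_i)}{\sum_{s_i} P(y_{pi} \mid s_i)\, P(s_i \mid \theta_p, \delta_i)}
= \prod_p \prod_i P(s_{pi} \mid \theta_p, \delta_i, y_{pi}) .
$$

Hence, sampling from $f(s \mid \theta, \delta, y)$ entails independently drawing $N_p N_i$ latent
subsets, one for every combination of a person and an item. To this aim, we make
a list of all $2^{J_i}$ subsets and calculate for each subset on the list the probability

$$P(s_j \mid \theta_p, \delta_i, y_{pi}) = \frac{P(y_{pi} \mid s_j)\, P(s_j \mid \theta_p, \delta_i)}{\sum_{s_i} P(y_{pi} \mid s_i)\, P(s_i \mid \theta_p, \delta_i)} ,$$

where $j = 1, \ldots, 2^{J_i}$ and

$$P(y_{pi} \mid s_j)\, P(s_j \mid \theta_p, \delta_i) \propto \frac{1 - s_{j y_{pi}}}{\sum_{h=0}^{J_i} (1 - s_{jh})} \exp\Bigl( \alpha_i\theta_p s_j^+ - \sum_{k=1}^{J_i} s_{jk}\delta_{ik} \Bigr) .$$

With these probabilities we then choose a random subset from the list (see
e.g., Ross, 2003, section 11.4).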
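Drawing one latent subset for a given person and item can be sketched as follows. The code lists all subsets, weights each by the product of the guessing probability and the unnormalized subset probability, and picks one at random. The function names and parameter values are assumptions for illustration; `random.Random.choices` from the standard library stands in for the selection step.

```python
import math
import random
from itertools import product


def subset_posterior(y, theta, alpha, deltas):
    """Posterior over latent subsets given the chosen option y (0 = correct option)."""
    subsets, weights = [], []
    for s in product((0, 1), repeat=len(deltas)):
        flags = (0,) + s                    # option 0 is never recognized as wrong
        n_left = sum(1 - f for f in flags)  # alternatives left to guess from
        s_plus = sum(s)                     # number of distractors recognized as wrong
        weight = (1 - flags[y]) / n_left * math.exp(
            alpha * theta * s_plus - sum(s_k * d for s_k, d in zip(s, deltas)))
        subsets.append(s)
        weights.append(weight)
    total = sum(weights)
    return subsets, [w / total for w in weights]


def draw_subset(rng, y, theta, alpha, deltas):
    subsets, probs = subset_posterior(y, theta, alpha, deltas)
    return rng.choices(subsets, weights=probs, k=1)[0]


# Hypothetical item with three distractors; the person chose option 2.
rng = random.Random(5)
subsets, probs = subset_posterior(y=2, theta=0.3, alpha=1.1, deltas=[-0.4, 0.1, 0.8])
drawn = [draw_subset(rng, 2, 0.3, 1.1, [-0.4, 0.1, 0.8]) for _ in range(200)]
```

Subsets in which the chosen option was recognized as wrong receive probability zero, so they are never drawn. Enumeration is feasible because the number of distractors per item is small.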

Note that the NM has many parameters and hence a large number of
persons is required to estimate the item parameters with reasonable precision.
As an illustration we provide, in Figure 9, recovery plots of true values against
estimated posterior means, for a (small) data set with 20 trichotomous items
and 200 persons. That is, we have simulated data under the NM, estimated the
parameters, and plotted the parameter values used to generate the data against
the posterior means. It is seen that recovery is not particularly good.

Figure 9. A typical recovery plot for an analysis with 20 items and 200 persons
(panels: Location Parameters, Discrimination Parameters).
Generating parameter values are on the horizontal axes. Estimated posterior means
are on the vertical axes.

Discussion 

In this article we have given an expository account of the DA-T Gibbs
sampler for the 2PL. In addition, we have illustrated how the DA-T Gibbs
sampler for the 2PL is extended to estimate models that are a special case of
the 2PL (LLTM), or used as a building block to construct samplers for more
complex models (2PLMMs). Further applications can be found in Maris and
Maris (2002).

The DA-T Gibbs sampler is simple to implement but may be slow to
converge. Especially with large, returning applications, the algorithm may
need to run longer than we can afford to wait so that it makes sense to invest
time in developing and programming a more efficient (sampling) algorithm
(e.g., Chib & Greenberg, 1995).

Our focus has been on Gibbs sampling. As a consequence, a number of
important issues were ignored or have only been mentioned in passing. For
more information on Bayesian theory and methods, we refer to general
textbooks, such as Bernardo and Smith (1994), Chen, Shao, and Ibrahim (2000),
Gelman, Carlin, Stern, and Rubin (1995), Gill (2002), or Tanner (1996).